A retrospective investigation was performed to evaluate whole-genome sequencing as a benchmark for
comparing molecular subtyping methods for Salmonella enterica serotype Enteritidis and to survey the
population structure of commonly encountered S. enterica serotype Enteritidis outbreak isolates in the
United States. A total of 52 S. enterica serotype Enteritidis isolates representing 16 major outbreaks
and three sporadic cases collected between 2001 and 2012 were sequenced and subjected to subtyping by
four different methods: (i) whole-genome single-nucleotide-polymorphism typing (WGST), (ii)
multiple-locus variable-number tandem-repeat (VNTR) analysis (MLVA), (iii) clustered regularly
interspaced short palindromic repeats combined with multi-virulence-locus sequence typing
(CRISPR-MVLST), and (iv) pulsed-field gel electrophoresis (PFGE). WGST resolved all outbreak clusters
and provided robust phylogenetic inference with high epidemiological correlation. While both MLVA and
CRISPR-MVLST yielded higher discriminatory power than PFGE, MLVA outperformed CRISPR-MVLST and PFGE in
delineating outbreak clusters, whereas CRISPR-MVLST showed the potential to trace major lineages and
ecological origins of S. enterica serotype Enteritidis. Our results suggest that whole-genome
sequencing makes a viable platform for the evaluation and benchmarking of molecular subtyping methods.


Salmonella enterica is currently the most common bacterial foodborne pathogen in the United States,
causing over 1 million cases of illness annually, including approximately 20,000 hospitalizations and
400 deaths (1). Serotyping is commonly used to subtype strains below the species level for
epidemiologic purposes. Salmonella enterica serotype Enteritidis was the serotype most commonly linked
to foodborne outbreaks between 1998 and 2008 in the United States, with shell eggs being the major
vehicle for foodborne transmission (2). In recent years, S. enterica serotype Enteritidis was also
found to cause multistate outbreaks associated with other foods, such as ground beef (2012), Turkish
pine nuts (2011), and alfalfa and spicy sprouts (2011), in addition to shelled eggs (2010) (3).

During outbreak investigations, it is critical to employ subtyping methods capable of distinguishing
outbreak isolates from epidemiologically distinct but genetically related bacterial strains. Most S.
enterica serotype Enteritidis isolates have been shown to be genetically homogeneous, making it
difficult for conventional subtyping methods such as pulsed-field gel electrophoresis (PFGE), the
current gold standard for strain-level Salmonella subtyping, to discriminate between strains (4, 5).
Among the S. enterica serotype Enteritidis isolates reported to PulseNet (6), approximately 45%
display a single PFGE pattern using XbaI (JEGX01.0004), rendering PFGE ineffective in some foodborne
outbreak investigations. One strategy to improve subtype resolution is to target hypervariable regions
(i.e., regions of the bacterial chromosome with less genetic stability) to produce sufficient
polymorphism for strain differentiation. Two such methods have been developed and evaluated with S.
enterica serotype Enteritidis isolates. Multilocus variable-number tandem-repeat analysis (MLVA)
utilizes the polymorphism in the copy numbers of tandemly repeated sequences at multiple loci in the
S. enterica serotype Enteritidis genome. It provides higher resolution than PFGE (7, 8) and has become
a supplementary subtyping technique for surveillance and investigation of S. enterica serotype
Enteritidis outbreaks by PulseNet. Analysis using clustered regularly interspaced short palindromic
repeats (CRISPRs) combined with multi-virulence-locus sequence typing (designated CRISPR-MVLST) takes
advantage of combined sequence variations in the spacer regions of the two CRISPR loci in Salmonella
and two virulence genes (fimH and sseL) (9). This recently proposed subtyping scheme allowed better
discrimination of S. enterica serotype Enteritidis isolates than PFGE (10).

Common criteria to evaluate the efficacy of subtyping methods include discriminatory power and
clustering concordance with epidemiological data. Both MLVA and CRISPR-MVLST have been assessed in
Salmonella based on these criteria (7, 8, 10-13). Evaluation of subtyping methods is often conducted
through comparisons with PFGE; however, PFGE is not sufficiently discriminatory for clonal organisms
such as S. enterica serotype Enteritidis, and its utility as a benchmark for other subtyping
techniques can be compromised. In recognition of this, multiple enzymes have been used as part of a
PFGE scheme to improve discrimination (5). Nevertheless, the lack of diversity in PFGE patterns, as in
the case of S. enterica serotype Enteritidis subtyping, may prevent the differentiation of
epidemiologically unrelated isolates.

Powered by whole-genome-sequencing (WGS) technologies, recent implementations of whole-genome
single-nucleotide-polymorphism (SNP) typing (WGST) have led to substantial improvements in both
molecular subtyping and phylogenetic analyses, particularly for genetically homogeneous bacterial
pathogens such as S. enterica serotype Enteritidis (14, 15). A recent WGS-based survey of S. enterica
serotype Enteritidis isolates resolved the commonly circulating S. enterica serotype Enteritidis
populations in the United States into five major genetic lineages, revealing potential patterns in
their geographical and epidemiological distribution (15).

WGS allows discovery of SNPs across entire bacterial genomes, thereby providing superior subtyping
resolution and phylogenetic accuracy, which can be utilized for benchmarking other subtyping methods.
In this study, we assembled a cohort of 52 S. enterica serotype Enteritidis isolates from 15 major
foodborne disease outbreaks and three sporadic cases in the United States and 1 outbreak in Mauritius
between 2001 and 2012. A retrospective investigation of these isolates was performed with a
combination of WGST, MLVA, CRISPR-MVLST, and PFGE analyses to compare their respective performances in
delineating each individual outbreak under the guidance of the recently proposed phylogenetic
framework and population structure of S. enterica serotype Enteritidis (15).

MATERIALS AND METHODS

Bacterial isolates. A total of 52 S. enterica serotype Enteritidis isolates were obtained from the
National Salmonella Reference Laboratory at the Centers for Disease Control and Prevention (Table 1).
Forty-nine isolates were epidemiologically linked to 16 outbreaks, and three were from sporadic cases.
The sporadic isolates were collected during a 2012 outbreak of infections linked to ground beef
(outbreak D; http://www.cdc.gov/salmonella/enteritidis-07-12/). They were included to test the ability
of a particular subtyping method to distinguish between sporadic and outbreak isolates.

TABLE 1 Isolates used in this study^a

Isolate      Outbreak  Epidemiologic information
J0900        A         Almonds, CA, 2001
J0905        A         Almonds, CA, 2001
2011K-1845   B         Fast food restaurant, TX, 2011
2011K-1846   B         Fast food restaurant, TX, 2011
H9556        C         Juice, CA, 2003
H9558        C         Juice, CA, 2003
2012K-0627   D         Ground beef, VT, 2012
2012K-0628   D         Ground beef, VT, 2012
2012K-0644   D         Ground beef, VT, 2012
2012K-0738   NA        Sporadic case during outbreak D, MD, 2012
2012K-0619   NA        Sporadic case during outbreak D, TX, 2012
2012K-0597   NA        Sporadic case during outbreak D, GA, 2012
2009K-1740   E         Chicken, MD, 2009
2009K-1742   E         Chicken, MD, 2009
2010K-0338   F         Chili sauce, Mauritius, 2009
2010K-0348   F         Uncooked chicken tikka, Mauritius, 2009
2010K-0351   F         Mauritius, 2009
2010K-0358   F         Raw chicken, Mauritius, 2009
2010K-0362   F         Mauritius, 2009
2011K-1667   G         Turkish pine nuts, NY, 2011
2011K-1668   G         Turkish pine nuts, NY, 2011
K3308        H         Stuffed chicken products, MN, 2006
K3310        H         Stuffed chicken products, MN, 2006
K2330        I         OH, 2005
K2331        I         OH, 2005
2012K-0284   J         Elderly care facility, MA, 2012
2012K-0285   J         Elderly care facility, MA, 2012
2012K-0283   J         Elderly care facility, MA, 2012
2010K-2617   K         Guinea pig, WI, 2011
2011K-0019   K         Guinea pig, CA, 2011
2011K-0079   K         Guinea pig, OR, 2011
2011K-0104   K         Guinea pig, IL, 2011
2012K-0499   L         Restaurant, NC, 2012
2012K-0500   L         Restaurant, NC, 2012
2012K-0501   L         Restaurant, NC, 2012
2009K-1553   M         Eggs, PA, 2009
2009K-1559   M         Eggs, PA, 2009
2009K-1562   M         Eggs, PA, 2009
2010K-1946   N         Tall ships, PA, 2010
2010K-1947   N         Tall ships, PA, 2010
2009K-1545   M         Eggs, PA, 2009
K2082        O         Hospital eggs, GA, 2005
K2083        O         Hospital eggs, GA, 2005
2010K-0666   P         Restaurant, CT, 2010
2010K-0667   P         Restaurant, CT, 2010
2010K-0668   P         Restaurant, CT, 2010
2010K-0669   P         Restaurant, CT, 2010
2010K-0672   P         Restaurant, CT, 2010
2010K-0673   P         Restaurant, CT, 2010
2010K-0677   P         Food worker, CT, 2010
2010K-0678   P         Food worker, CT, 2010
2010K-0675   P         Restaurant, CT, 2010

^a Isolates J0900 and J0905 were collected from the environment; isolates 2012K-0644, 2010K-0338,
2010K-0348, 2010K-0358, and 2011K-1668 were collected from foods; all the other isolates were
collected from humans. NA, not applicable.

FIG 1 Phylogeny and outbreak clusters inferred by WGST. Different lineages (I, II, III, IV, and V) are
labeled. A total of 16 outbreak clusters (A through P) are identified and labeled. Bootstrap values of
branches leading to individual outbreak clusters are shown. The designations of three isolates from
sporadic cases (2012K-0619, 2012K-0738, and 2012K-0597) are underlined.

WGST. Bacterial strains were grown in Luria broth at 37°C to the stationary phase. Genomic DNA was
prepared using a GenElute genomic DNA isolation kit (Sigma-Aldrich, St. Louis, MO). WGS was performed
at TGen North using Illumina technology (100-bp paired-end reads) as described in previous studies
(16, 17). All WGS data files were deposited in the NCBI Sequence Read Archive
(http://www.ncbi.nlm.nih.gov/bioproject) under project number PRJNA251730. Average sequencing coverage
is summarized in Table S3 in the supplemental material. SNP detection was performed essentially as
described in our previous study (15). Briefly, trimmed and filtered sequencing reads were mapped to a
reference genome (P125109; GenBank accession no. AM933172.1) to call variants (SNPs, insertions, and
deletions). For each genome analyzed, a list of high-quality SNPs was derived by subjecting initial
SNP calls to a set of quality filters, including a minimum Phred base score of 60, a minimum
read-mapping score of 20, a mapping depth ranging from 5 to 100 reads per locus, and a maximum
alternative allele percentage of 25%. SNPs were accepted only when confirmed by reads mapped to both
the forward and reverse strands. High-quality SNPs detected from the conserved genome regions (i.e.,
core genome SNPs) among the 52 S. enterica serotype Enteritidis genomes and the reference genome were
used to construct a maximum-likelihood (ML) tree using MEGA 5 (18). A similar ML tree was built by
further incorporating 125 S. enterica serotype Enteritidis and 3 Salmonella enterica serotype Nitra
recently sequenced genomes that represent the population structure of commonly circulating S. enterica
serotype Enteritidis isolates in the United States (15).
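
The quality filters listed in the WGST paragraph can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the record layout and every field name are hypothetical, and "alternative allele percentage" is assumed here to mean the percentage of reads at the locus that support an allele other than the called one. Only the numeric thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the WGST methods paragraph.
MIN_BASE_QUALITY = 60
MIN_MAPPING_QUALITY = 20
MIN_DEPTH, MAX_DEPTH = 5, 100
MAX_ALTERNATIVE_ALLELE_PCT = 25.0


@dataclass(frozen=True)
class SnpCall:
    """One candidate SNP call. All field names are hypothetical."""
    position: int
    base_quality: float            # Phred-scaled base score of the call
    mapping_quality: float         # read-mapping score at the locus
    depth: int                     # number of reads covering the locus
    alternative_allele_pct: float  # assumed: % of reads not supporting the call
    forward_reads: int             # supporting reads on the forward strand
    reverse_reads: int             # supporting reads on the reverse strand


def is_high_quality(call: SnpCall) -> bool:
    """Return True only if the call passes every filter, including
    confirmation by reads on both strands."""
    return (
        call.base_quality >= MIN_BASE_QUALITY
        and call.mapping_quality >= MIN_MAPPING_QUALITY
        and MIN_DEPTH <= call.depth <= MAX_DEPTH
        and call.alternative_allele_pct <= MAX_ALTERNATIVE_ALLELE_PCT
        and call.forward_reads > 0
        and call.reverse_reads > 0
    )


def filter_calls(calls):
    """Keep the calls that pass all filters, preserving input order."""
    return [call for call in calls if is_high_quality(call)]
```

Keeping the thresholds as named constants makes it easy to see which values come from the text and to vary them when testing how sensitive the SNP list is to each filter.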

PFGE and MLVA. PFGE (using XbaI) and MLVA were performed according to standard PulseNet protocols (19)
(http://www.cdc.gov/pulsenet/pathogens/). Dendrograms of PFGE and MLVA patterns were generated by
BioNumerics software (Applied-Maths, St.-Martens-Latem, Belgium).

CRISPR-MVLST. For each sequenced genome, contigs were de novo assembled by Velvet (20). The sequence of
each marker (CRISPR1, CRISPR2, fimH, and sseL) was extracted from the respective contigs. Individual
alleles were given a numeric identifier, as described previously (9), and a CRISPR-MVLST sequence type
was determined based on the unique allelic combination of the markers. The presence of homologous
direct repeats and duplicated spacers can complicate contig assembly for the CRISPR arrays. The
majority of CRISPR alleles were determined using the WGS data. For the few isolates for which we were
unable to extract the CRISPR sequences from the assemblies, we PCR amplified and sequenced the CRISPR
array as previously described (12). To depict the clustering of subtypes determined by CRISPR-MVLST,
the binary distribution (presence or absence) of every spacer in CRISPR1 and CRISPR2 and every SNP in
fimH and sseL was profiled for each isolate. Specifically, if a spacer or a SNP was present in an
isolate, it was designated "1"; otherwise, it was designated "0." The binary distribution patterns of
all isolates were then combined and input into SplitsTree (21) to build a dendrogram by employing the
unweighted-pair group method using average linkages (UPGMA) algorithm.
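
The presence/absence profiling and UPGMA clustering described above can be sketched in plain Python. This is only an illustration under stated assumptions: the study used SplitsTree, whereas the code below is a small stand-in written from scratch; the feature names in the usage note are hypothetical; and the distance between two profiles is assumed to be the number of positions at which they differ.

```python
from itertools import combinations


def binary_profiles(features_by_isolate):
    """Turn {isolate: set of spacers/SNPs present} into 0/1 profiles.

    Returns the sorted feature list and {isolate: tuple of 0/1 values},
    where 1 means the feature is present in that isolate.
    """
    all_features = sorted(set().union(*features_by_isolate.values()))
    profiles = {
        isolate: tuple(int(feature in present) for feature in all_features)
        for isolate, present in features_by_isolate.items()
    }
    return all_features, profiles


def hamming(a, b):
    """Number of positions at which two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree."""
    # Each cluster is (subtree, list of member isolates).
    clusters = [(name, [name]) for name in sorted(profiles)]
    while len(clusters) > 1:
        best = None
        for (i, (_, m1)), (j, (_, m2)) in combinations(enumerate(clusters), 2):
            # UPGMA distance: mean of all pairwise member distances.
            total = sum(hamming(profiles[a], profiles[b]) for a in m1 for b in m2)
            distance = total / (len(m1) * len(m2))
            if best is None or distance < best[0]:
                best = (distance, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

For example, with three hypothetical isolates where `iso1` and `iso2` carry spacers `sp1`, `sp2`, and `sp3` and `iso3` carries `sp1` and `sp4`, `upgma` joins `iso1` and `iso2` first and then attaches `iso3` to that pair.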

Discriminatory power. The ability of each subtyping method evaluated in this study to differentiate
the sampled S. enterica serotype Enteritidis isolates was calculated using Simpson's index of
diversity (22).
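
As a worked illustration of the discriminatory power calculation, the sketch below computes Simpson's index of diversity in the form widely used for typing methods, D = 1 - [sum of n_j(n_j - 1)] / [N(N - 1)], where N is the number of isolates and n_j is the number of isolates assigned to subtype j. It is assumed, not confirmed by the text, that this is the formulation given in reference 22; the function name is hypothetical.

```python
from collections import Counter


def simpsons_index_of_diversity(subtypes):
    """Probability that two isolates drawn at random without replacement
    belong to different subtypes.

    `subtypes` is a sequence with one subtype label per isolate.
    """
    counts = Counter(subtypes)
    n_isolates = sum(counts.values())
    if n_isolates < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_isolates * (n_isolates - 1))
```

The index is 0 when every isolate has the same subtype and 1 when every isolate has a different subtype, so a method that splits the same isolate set into more, and more evenly sized, subtypes scores higher.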

RESULTS 

WGST-based investigation of outbreak and sporadic isolates. A total of 2,353 SNPs were identified from
the core genome of the 52 S. enterica serotype Enteritidis isolates and the reference strain. These
SNPs resolved the cohort of outbreak and sporadic isolates into 34 SNP haplotypes and allowed the
delineation of all 16 outbreak clusters (Fig. 1, clusters A through P). The inferred phylogeny of
these isolates was highly consistent with their outbreak association. All but one outbreak isolate
(2009K-1545) fell into their respective outbreak clusters. 2009K-1545 was considered to be associated
with a shelled-egg outbreak in Pennsylvania in 2009 (outbreak M). However, it appeared to be
phylogenetically more closely related to isolates from another outbreak among crew members of a
historic sailing ship in the same state in 2010 (outbreak N). The three isolates (2012K-0619,
2012K-0738, and 2012K-0597) from sporadic cases that occurred during the 2012 ground beef outbreak
(outbreak D) were dispersed throughout the tree with substantial phylogenetic distances from the
outbreak cluster, indicating their separate origins from sources other than the contaminated ground
beef (Fig. 1).

TABLE 2 Comparison of outbreak delineations of different subtyping methods^a

                    Result by:
Outbreak  WGST  MLVA  CRISPR-MVLST  PFGE
A         ++    +     +             -
B         ++    +     +             ++
C         ++    ++    +             ++
D         ++    ++    +             +
E         ++    +     +             +
F         ++    ++    ++            +
G         ++    ++    ++            ++
H         ++    +     +             +
I         ++    ++    +             +
J         ++    -     +             +
K         ++    -     -             +
L         ++    ++    ++            +
M         ++    +     +             +
N         ++    +     -             +
O         ++    +     -             +
P         ++    +     -             +

^a Symbols are used to report evaluations of subtyping methods. ++, isolates from the outbreak formed
a cluster, and the cluster did not include isolates from other outbreaks or sporadic cases; +,
isolates from the outbreak clustered with each other but also with isolates from other outbreaks or
sporadic cases; -, isolates from the outbreak did not form a cluster.
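
The notion of SNP haplotypes used in the WGST results can be made concrete with a short Python sketch. It assumes, as a simplification, that each isolate is represented by its alleles at the same ordered list of core-genome SNP positions, and that two isolates share a haplotype when those alleles are identical at every position. The function names and the toy data in the usage note are hypothetical.

```python
from collections import defaultdict


def snp_haplotypes(alleles_by_isolate):
    """Group isolates whose core-genome SNP alleles are identical.

    `alleles_by_isolate` maps isolate name -> sequence of alleles, with all
    sequences ordered by the same SNP positions. Returns a sorted list of
    sorted isolate groups, one group per haplotype.
    """
    groups = defaultdict(list)
    for isolate, alleles in alleles_by_isolate.items():
        groups[tuple(alleles)].append(isolate)
    return sorted(sorted(members) for members in groups.values())


def snp_distance(alleles_a, alleles_b):
    """Number of SNP positions at which two isolates differ."""
    if len(alleles_a) != len(alleles_b):
        raise ValueError("allele sequences must cover the same positions")
    return sum(a != b for a, b in zip(alleles_a, alleles_b))
```

With toy input such as `{"iso1": "ACGT", "iso2": "ACGT", "iso3": "ACGA"}`, the first two isolates form one haplotype and the third forms another, at a distance of one SNP from the others.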

PFGE, MLVA, and CRISPR-MVLST subtyping. PFGE and MLVA results are summarized in Table S1 in the
supplemental material. CRISPR-MVLST results are summarized in Table S2. Briefly, 15 different S.
enterica serotype Enteritidis sequence types (ESTs) were identified among the 52 isolates using
CRISPR-MVLST. Eleven ESTs were previously observed in other S. enterica serotype Enteritidis clinical
isolates, and four (EST43, EST44, EST45, and EST46) appeared to be new (10, 23). The most frequent EST
was EST12 (17% of isolates; 9/52), followed by EST4 (12%; 6/51). The four new ESTs were designated due
to new alleles identified for sseL and CRISPR1 (EST43), CRISPR1 and CRISPR2 (EST44), CRISPR2 (EST45),
or sseL (EST46).

Comparison of subtyping methods. Analysis of all S. enterica serotype Enteritidis isolates with three
distinct subtyping methods (PFGE, MLVA, and CRISPR-MVLST) allowed a comparison of their relative
subtyping efficacies, which were benchmarked by WGST and evaluated by three criteria: (i)
discriminatory power, (ii) delineation of outbreak clusters, and (iii) phylogenetic concordance with
WGST.

A total of 8, 18, 16, and 34 subtypes were identified from the 52 isolates by PFGE, MLVA,
CRISPR-MVLST, and WGST, respectively, resulting in their respective discriminatory powers of 0.81,
0.92, 0.93, and 0.97.

Each of the 16 outbreak clusters was unequivocally identified by WGST; isolates from each outbreak
formed distinct clades (Table 2 and Fig. 2). MLVA resolved six outbreak clusters (outbreaks C, D, F,
G, I, and L), CRISPR-MVLST identified three (F, G, and L), and PFGE differentiated three (B, C, and
G). For another eight outbreak clusters, MLVA was able to cluster the corresponding isolates, but the
clusters did not definitively exclude other isolates. Similarly, 9 and 12 outbreaks were
inconclusively clustered by CRISPR-MVLST and PFGE, respectively. Isolates from two outbreaks, four
outbreaks, and one outbreak failed to cluster by MLVA, CRISPR-MVLST, and PFGE, respectively (Table 2
and Fig. 2).
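
The three outcomes counted above correspond to the symbols defined in the footnote of Table 2. A minimal Python sketch of that scoring follows, under a simplifying assumption: each method is taken to assign every isolate a single flat cluster label, whereas the study read clusters from dendrograms. The function and argument names are hypothetical.

```python
def delineation_symbol(outbreak, outbreak_of, cluster_of):
    """Score how well one method delineates one outbreak.

    outbreak_of: {isolate: outbreak label, or None for a sporadic case}
    cluster_of:  {isolate: cluster label assigned by the subtyping method}

    Returns "++" if the outbreak isolates form a cluster that contains no
    other isolates, "+" if they cluster together but share the cluster with
    isolates from other outbreaks or sporadic cases, and "-" if they do not
    form a single cluster.
    """
    members = [iso for iso, label in outbreak_of.items() if label == outbreak]
    if not members:
        raise ValueError(f"no isolates recorded for outbreak {outbreak!r}")
    clusters = {cluster_of[iso] for iso in members}
    if len(clusters) != 1:
        return "-"
    (cluster,) = clusters
    has_outsiders = any(
        label == cluster and outbreak_of.get(iso) != outbreak
        for iso, label in cluster_of.items()
    )
    return "+" if has_outsiders else "++"
```

Applying the function to every outbreak for each method and tallying the symbols would reproduce the kind of counts reported in the paragraph above, subject to how clusters are defined for each method.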

While CRISPR-MVLST, MLVA, and PFGE are not intended for phylogenetic inference, CRISPR-MVLST correctly
identified all four major lineages defined by WGST (Fig. 2).

DISCUSSION 

The exceptional performance of WGST in the fine-scale delineation of outbreaks of infectious disease
has been demonstrated in recent investigations (24-31). In the current study, we expanded the
evaluation of WGST by retrospectively investigating isolates from 15 recent S. enterica serotype
Enteritidis outbreaks in the United States and 1 in Mauritius. This collection of isolates represents
the known phylogenetic diversity and epidemiological prevalence of commonly circulating S. enterica
serotype Enteritidis lineages in the United States in recent years, as previously surveyed (15), and
therefore provides a realistic assessment of WGST in discriminating this otherwise
difficult-to-subtype pathogen. With the exception of 2009K-1545 (discussed below), WGST was able to
unequivocally discriminate each outbreak cluster by exclusively assigning outbreak isolates to it.

Three sporadic strains (2012K-0619, 2012K-0738, and 2012K-0597) were isolated during a multistate
outbreak linked to ground beef in 2012 and found to display a PFGE pattern indistinguishable from that
of the outbreak strain. Both MLVA and CRISPR-MVLST separated them from the temporally related outbreak
D isolates (Fig. 2), and WGST was further able to identify these isolates as epidemiologically
unrelated to this and any other outbreak, as well as to each other, as shown in Fig. 1.

WGST also indicated that outbreak M might have been polyclonal (i.e., that multiple strains might have
been involved in the same outbreak), as a previously identified outbreak isolate (2009K-1545) fell
outside the major outbreak cluster, which was also shown by the CRISPR-MVLST result (Fig. 2; see also
Table S2 in the supplemental material). Interestingly, WGST suggested that 2009K-1545 was
phylogenetically close to outbreak N, which was temporally and geographically related to outbreak M
(Pennsylvania, 2009 to 2010). Therefore, some isolates from the two outbreaks may have originated from
a recent common ancestor, which is consistent with the fact that the patterns of the outbreak M and N
isolates were indistinguishable by MLVA and PFGE (Fig. 2). Together, these results suggest that WGST
makes a superior subtyping tool that can reliably define S. enterica serotype Enteritidis outbreak
clusters in the epidemiological setting of recent S. enterica serotype Enteritidis outbreaks in the
United States.

The ability of WGST to concurrently provide superior discriminatory power and accurate phylogenetic
inferences has the potential to bridge outbreak investigations with long-term and large-scale
epidemiological studies. WGST defines outbreaks by resolving phylogenetic relationships rather than by
targeting hypervariable but phylogenetically uninformative markers. This provides information
regarding the evolutionary dynamics and population structure of the pathogen, which, in turn, can help
increase understanding of the patterns and trends of its distribution and infection. To further
demonstrate the robustness of WGST in delineating outbreak clusters among closely related S. enterica
serotype Enteritidis isolates, including those of the same PFGE patterns, we included a total of 125
previously sequenced S. enterica serotype Enteritidis isolates from a phylogenetic and epidemiologic
survey of this serotype (15). As shown in Fig. S1 in the supplemental material, all the outbreak
isolates analyzed in the current study formed distinct clusters (highlighted in red in Fig. S1)
consistent with their epidemiological information in the background of the additional isolates,
including the ones with the same PFGE patterns as some of the outbreak isolates (highlighted in blue
in Fig. S1). Only one previously sequenced isolate (02-2966) grouped within an outbreak cluster
(outbreak K, a 2011 multistate outbreak associated with guinea pigs). Isolate 02-2966 was collected
from a rodent in California in 2002; its PFGE pattern is unknown.

FIG 2 Clustering of outbreak isolates by MLVA, CRISPR-MVLST, and PFGE. Outbreaks are labeled A through
P according to Table 1 data. Outbreaks that included isolates not clustered together are labeled with
a single asterisk (*) and indicated by dashed lines. The designations of three isolates from sporadic
cases (2012K-0619, 2012K-0738, and 2012K-0597) are underlined. The lineages to which each isolate
belonged are also labeled. These dendrograms are intended to show the hierarchical clustering of
isolates, and their branch lengths are not comparable between the different methods.

The isolates investigated in the current study fell within four of the five previously defined
lineages (15) when they were incorporated into the previous phylogeny (see Fig. S1 in the supplemental
material). Furthermore, the epidemiological information and phylogenetic distribution for the newly
sequenced strains corresponded with the geographic characteristics and observed prevalence of the five
lineages. Specifically, the isolates from outbreaks A and B were from California and Texas,
respectively, which is consistent with their clustering in a major clade of lineage I mainly
consisting of western American isolates (15). J0900 and J0905 were isolated from environmental
sources, similarly to other isolates in this clade that were predominantly associated with
environmental origins. Whereas none of the American isolates surveyed in the current study clustered
in lineage III, a lineage characterized by its international spread, the majority of them (39 of 52
isolates; 10 of the 16 outbreak clusters) were found in lineage V, a typical domestic lineage often
associated with poultry products. Lineage IV was represented by only one isolate and was considered to
be rare or undersampled in the previous study (15). It remained the least sampled lineage in the
current study, with three isolates (2012K-0738, 2009K-1740, and 2009K-1742). All the isolates
identified in lineage IV so far were isolated in Maryland. Interestingly, the less-sampled lineage II,
which was previously recognized as a population associated with marine mammals in California, was
found to also include isolates from outbreaks and sporadic cases widespread on the west coast
(California; outbreak C), east coast (Vermont; outbreak D), and Gulf coast (Texas; 2012K-0619). It was
hypothesized that free-ranging and migratory marine mammals and the birds that share their habitats
could potentially play a role in long-distance dispersal of this pathogen (15). While CRISPR-MVLST was
able to delineate major lineages, it was not possible to reveal such patterns by MLVA and PFGE.

Most comparative studies of different molecular subtyping schemes have focused on performance
parameters such as discriminatory power and subtype correlation (32, 33). This approach is sometimes
confounded by the limited resolution of common subtyping markers and/or lack of coherence between
them, as dictated by their inherent differences in mutation rates and evolutionary history. PFGE has
been used as a standard to facilitate comparison, but it is not ideal, especially for genetically
homogeneous organisms such as S. enterica serotype Enteritidis. Using WGS to benchmark molecular
subtyping enables more rigorous evaluation by interrogating subtypes defined by particular methods
with unparalleled resolution and superior phylogenetic accuracy. In the present study, the
side-by-side comparison of commonly used and recently developed subtyping methods for S. enterica
serotype Enteritidis was guided by the robust subtyping and phylogenetic inference of WGST. This
allowed a thorough evaluation of the relative performances of MLVA and CRISPR-MVLST that was otherwise
not possible.

For instance, CRISPR-MVLST was outperformed by MLVA in outbreak cluster delineation but was able to
resolve each of the major lineages. We also observed that it was the CRISPR components, rather than
the virulence genes, in the CRISPR-MVLST scheme that afforded the differentiation of the lineages (see
Table S2 in the supplemental material). Because CRISPR spacers originate from phages and plasmids that
might be characteristic of particular environments (34), CRISPRs might capture signals of ecological
relevance. Given the dynamic nature of CRISPR loci with respect to spacer acquisition, loss, and
duplication, we hypothesized that the identification of major S. enterica serotype Enteritidis
lineages by CRISPR-MVLST was due to the imprinting of exogenous genetic cues on the CRISPRs that
reflect the different ecological origins of major lineages. However, a recent study that included
various Salmonella serotypes suggested that such signals might not be phylogenetically informative at
the species level due to factors such as horizontal gene transfer and acquisition of common CRISPRs by
different lineages (35). Further studies are necessary to investigate the robustness and scope of
CRISPR subtyping in detecting ecological and evolutionary patterns of Salmonella and other organisms.

It is anticipated that WGS will eventually become the new gold standard for microbial pathogen
subtyping. Ongoing efforts such as the 100K Genome Project (http://100kgenome.vetmed.ucdavis.edu/),
GenomeTrakr Network
(http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/), Global Microbial
Identifier (http://www.globalmicrobialidentifier.org/), and Advanced Molecular Detection
(http://www.cdc.gov/amd/) are creating a vast resource of microbial genomes and piloting the
real-time, WGS-based surveillance of microbial pathogens. Instead of viewing WGS as the ultimate tool
that will soon spell the end of other subtyping methods, we recommend using WGS as a comprehensive
platform that will provide access to all existing and future genetic markers for subtyping. For
example, CRISPRs and virulence genes were retrieved from sequencing data in the present study, and
tools using WGS for other subtyping schemes have been developed (36, 37). Additionally, WGS provides a
wealth of genomic data that can be interrogated for additional features beyond phylogenetic analysis,
including gene content (e.g., antibiotic resistance and virulence genes), accessory genome changes
(e.g., plasmids and genomic islands), and the presence of phenotypically relevant SNPs (e.g.,
nonsynonymous and regulatory effectors). The incorporation of various subtyping methods into the WGS
platform will provide both backward compatibility with existing markers and data and extensibility to
newly developed schemes, thus facilitating the evaluation and benchmarking of molecular subtyping.


In this study, we evaluated only Illumina sequencing technology. Comparisons of different sequencing
platforms have been reported elsewhere (38, 39). Also, we did not attempt to address the important
issue of routine and broad implementation of WGS in clinical and public health applications, which has
recently been investigated (40).